Evaluation of an LSTM-RNN System in Different NIST Language Recognition Frameworks

Authors

  • Ruben Zazo
  • Alicia Lozano-Diez
  • Joaquin Gonzalez-Rodriguez
Abstract

Long Short-Term Memory recurrent neural networks (LSTM RNNs) provide outstanding performance in language identification (LID) due to their ability to model speech sequences. So far, previously published LSTM RNN solutions for LID have dealt with highly controlled scenarios, balanced datasets and limited channel variability. In this paper we evaluate an end-to-end LSTM LID system, comparing it against a classical i-vector system, in different environments based on data from the Language Recognition Evaluations (LRE) organized by NIST. In order to analyze its behavior, we train and test our system on a balanced and controlled subset of LRE09, on the development data of LRE15 and, finally, on the evaluation set of LRE15. Our results show that an end-to-end recurrent system clearly outperforms the reference i-vector system in a controlled environment, especially when dealing with short utterances. However, our deep learning approach is more sensitive to unbalanced datasets, channel variability and, especially, to the mismatch between development and test datasets.
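The abstract does not include implementation details, but a minimal sketch of what such an end-to-end LSTM LID classifier typically looks like is given below: LSTM layers run over a sequence of acoustic feature frames and the final hidden state feeds a softmax over language classes. The feature dimension, layer sizes, number of target languages and the PyTorch framing are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class LSTMLanguageID(nn.Module):
    """End-to-end LSTM language classifier: acoustic frames -> language posterior."""
    def __init__(self, num_feats=20, hidden=512, num_layers=2, num_langs=6):
        super().__init__()
        self.lstm = nn.LSTM(num_feats, hidden, num_layers, batch_first=True)
        self.out = nn.Linear(hidden, num_langs)

    def forward(self, frames):
        # frames: (batch, time, num_feats) sequence of acoustic features (e.g. MFCCs)
        _, (h_n, _) = self.lstm(frames)
        # use the last layer's final hidden state as the utterance-level summary
        return self.out(h_n[-1])

# illustrative usage on random data; a real system trains on labelled LRE utterances
model = LSTMLanguageID()
frames = torch.randn(8, 300, 20)      # 8 utterances of 300 feature frames each
labels = torch.randint(0, 6, (8,))    # language labels
loss = nn.CrossEntropyLoss()(model(frames), labels)
loss.backward()
```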

Similar Resources

Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition

Recently, the integration of deep neural networks (DNNs) trained to predict senone posteriors with conventional language modeling methods has proven effective for spoken language recognition. This work extends some of the senone-based DNN frameworks by replacing the DNN with the LSTM RNN. Two of these approaches use the LSTM RNN to generate features. The features are extracted from the rec...

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (∼3 s). In this contribution we present an open-source, end-to-end, LSTM RNN system running on limited computational resources...

Exploring robustness of DNN/RNN for extracting speaker baum-welch statistics in mismatched conditions

This work explores the use of DNNs/RNNs for extracting Baum-Welch sufficient statistics in place of the conventional GMM-UBM in speaker recognition. In this framework, the DNN/RNN is trained for automatic speech recognition (ASR) and each output unit corresponds to a component of the GMM-UBM. The outputs of the network are then combined with acoustic features to calculate sufficient statistics for...
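The abstract is truncated here, but the Baum-Welch sufficient statistics it refers to are the standard zero- and first-order statistics of i-vector pipelines, with the network's per-frame posteriors taking the role of GMM-UBM occupancies. The sketch below illustrates that computation; the array shapes are assumed for illustration only.

```python
import numpy as np

def baum_welch_stats(posteriors, features):
    """Zero- and first-order sufficient statistics from frame posteriors.

    posteriors: (T, C) per-frame component/senone posteriors (GMM-UBM or DNN/RNN)
    features:   (T, D) acoustic feature frames
    Returns N (C,) zero-order and F (C, D) first-order statistics.
    """
    N = posteriors.sum(axis=0)     # N_c = sum_t gamma_t(c)
    F = posteriors.T @ features    # F_c = sum_t gamma_t(c) * x_t
    return N, F

# illustrative shapes: 300 frames, 2048 components, 60-dim features
gamma = np.random.dirichlet(np.ones(2048), size=300)
x = np.random.randn(300, 60)
N, F = baum_welch_stats(gamma, x)
```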

Spoken Language Identification Using LSTM-Based Angular Proximity

This paper describes the design of an acoustic language identification (LID) system based on LSTMs that directly maps a sequence of acoustic features to a vector in a vector space where angular proximity corresponds to a measure of language/dialect similarity. A specific architecture for the LSTM-based language vector extractor is introduced along with the angular proximity loss function to ...
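The exact loss is not given in this excerpt, but the scoring idea, comparing an utterance-level LSTM embedding against per-language reference vectors by angle rather than by Euclidean distance, can be sketched as follows. The embedding dimension, the set of language vectors and the cosine-based scoring function are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def angular_scores(utt_vec, lang_vecs):
    """Score an utterance embedding against per-language vectors by angular proximity.

    utt_vec:   (D,) utterance-level vector from the LSTM extractor
    lang_vecs: (L, D) one reference vector per language/dialect
    Returns (L,) cosine similarities; the angle itself is the arccos of each score.
    """
    u = F.normalize(utt_vec, dim=0)
    v = F.normalize(lang_vecs, dim=1)
    return v @ u

# illustrative 256-dim embeddings for 10 target languages
scores = angular_scores(torch.randn(256), torch.randn(10, 256))
predicted_lang = scores.argmax()
```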

Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition

Long short-term memory (LSTM) is normally used as the basic recurrent unit in recurrent neural networks (RNNs). However, conventional LSTM assumes that the state at the current time step depends only on the previous time step. This assumption constrains its time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We empl...

Journal title:

Volume   Issue

Pages  -

Publication date: 2016